28 research outputs found
Bi-Modal Person Recognition on a Mobile Phone: using mobile phone data
This paper presents a novel, fully automatic bi-modal (face and speaker) recognition system that runs in real time on a mobile phone (a Nokia N900), demonstrating the feasibility of performing both automatic face and speaker recognition on such a device. We evaluate this recognition system on a novel, publicly available mobile phone database and provide a well-defined evaluation protocol. The database was captured almost exclusively with mobile phones and aims to advance research into deploying biometric techniques on mobile devices. We show, on this database, that face and speaker recognition can be performed in a mobile environment and that score fusion can improve performance by more than 25% in terms of error rates.
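The score-fusion idea behind the reported improvement can be sketched as a weighted sum of the two modality scores. This is a minimal illustration, not the paper's actual fusion rule; the weights, scores, and threshold below are hypothetical example values.

```python
def fuse_scores(face_score, speaker_score, w_face=0.5):
    """Weighted-sum fusion of two match scores.

    Scores are assumed to be normalized to a comparable range
    (e.g. [0, 1]); the weight w_face is an illustrative value,
    not one taken from the paper.
    """
    return w_face * face_score + (1 - w_face) * speaker_score

# Accept the claimed identity if the fused score clears a threshold.
fused = fuse_scores(0.62, 0.81, w_face=0.4)  # 0.4*0.62 + 0.6*0.81 = 0.734
accept = fused > 0.7
```

Fusing at the score level (rather than the feature level) keeps the two recognizers independent, which is convenient on a resource-constrained phone: each modality can run and be tuned separately, and only two scalars need to be combined.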
Automatically Derived Units in Speech Processing
Current systems for recognition, synthesis, very low bit-rate (VLBR) coding, and text-independent speaker verification rely on sub-word units determined using phonetic knowledge. This paper presents an alternative to this approach: determining speech units with AUSP (Automatic Language Independent Speech Processing) tools. Experimental results for speaker-dependent VLBR coding are reported on two databases; an average rate of 120 bps for unit encoding was achieved. For verification, this approach was tested during the 1998 NIST-NSA evaluation campaign with an MLP-based scoring system.
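The arithmetic behind a unit-encoding bit rate like the reported 120 bps can be sketched as unit rate times bits per unit index. The inventory size and unit rate below are illustrative assumptions, not figures from the paper.

```python
import math

def unit_coding_rate(units_per_second, inventory_size):
    """Bit rate for transmitting unit indices at a given unit rate.

    With plain fixed-length coding, each index into an inventory of
    N units costs ceil(log2(N)) bits; entropy coding can do better.
    """
    bits_per_unit = math.ceil(math.log2(inventory_size))
    return units_per_second * bits_per_unit

# Illustrative numbers (not from the paper): ~15 units/s drawn from
# a 256-unit inventory already costs 15 * 8 = 120 bps.
rate = unit_coding_rate(15, 256)  # → 120
```

This also shows why such coders stay in the "very low bit-rate" regime: the rate scales with how few, and how compactly indexable, the automatically derived units are.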
Audio Surveillance through Known Event Classification
A simple yet efficient framework for audio surveillance through known-event classification is presented. The use of the proposed system for unknown-event detection is also suggested and evaluated. Further, a specific audio event is detected with the help of audio classification, which lets the detection focus on a signal of specific behavior. The system can thus be used in several applications.
Speech spectrum representation and coding using multigrams with distance
Diphone-like units without phonemes - option for very low bit rate speech coding
Regularized subspace n-gram model for phonotactic iVector extraction
Phonotactic language identification (LID) by means of n-gram statistics and discriminative classifiers is a popular approach to the LID problem. A low-dimensional representation of the n-gram statistics enables the use of more diverse and efficient machine learning techniques for LID. Recently, we proposed the phonotactic iVector as such a low-dimensional representation. In this work, an enhanced model of the n-gram probabilities, together with regularized parameter estimation, is proposed. The proposed model consistently improves LID system performance across all conditions, by up to 15% relative to the previous state-of-the-art system. The new model also reduces the memory requirements of iVector extraction and helps speed up subspace training. Results are presented in terms of Cavg on the NIST LRE2009 evaluation set.
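The n-gram statistics that the subspace (iVector) model compresses can be sketched as normalized counts over a phone-token sequence. This is a generic illustration of phonotactic statistics, not the paper's model; the toy inventory and sequence are invented for the example.

```python
from collections import Counter
from itertools import product

def ngram_stats(phones, inventory, n=2):
    """Normalized n-gram counts over a phone-token sequence.

    Returns a fixed-length vector indexed by all possible n-grams
    over the inventory -- the high-dimensional statistics that a
    subspace (iVector) model would map to a low-dimensional vector.
    """
    grams = Counter(tuple(phones[i:i + n]) for i in range(len(phones) - n + 1))
    total = sum(grams.values()) or 1
    return [grams[g] / total for g in product(inventory, repeat=n)]

# Toy sequence "a b a b" over a 2-phone inventory: 3 bigrams total
# (ab, ba, ab), so the vector over (aa, ab, ba, bb) is [0, 2/3, 1/3, 0].
stats = ngram_stats(list("abab"), inventory="ab", n=2)
```

The vector length grows as N^n with inventory size N, which is why a low-dimensional subspace representation of these statistics is attractive for LID classifiers.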
SpeechDat-E: Five Eastern European Speech Databases for Voice-Operated Teleservices Completed